Appl. No. 10/616,718

Amdt. Dated May 29, 2007 

Reply to Office action of February 1, 2007 

REMARKS 

This is in response to the Office Action dated February 1, 2007. Claims 1 to 4 are
unamended. New claim 5 is generally patterned after claim 1 but is amplified to even more
clearly distinguish over the cited prior art.

In the Office Action, the claims were rejected under 35 U.S.C. § 103(a) as being
unpatentable over the Kargupta patent in view of the Kamath patent and further in view of the
Cho article.

First, the undersigned would like to thank examiner Daye and her supervisor Sana
Al-Hashemi for the courtesies extended during an interview on April 26, 2007. In attendance were
the undersigned, the applicant, Jerzy Bala, and Ali Hadjarian, an employee of applicant's
assignee. At the interview, Drs. Bala and Hadjarian explained how the prior art fails to disclose
the limitations in claim 1, utilizing the enclosed PowerPoint presentation. The examiners seemed
to agree that the cited prior art did not disclose "beginning attribute selection" and "selecting a
winning agent," and said they would consider those limitations further. They disagreed with many
of the other positions advanced by Drs. Bala and Hadjarian.

The Office Action sets forth how each limitation of the claims is met by certain ones of
the three items of prior art. We respectfully disagree. The reasons are set forth below under
headings corresponding to the various limitations in claim 5.

Claim limitation: beginning attribute selection by each agent, wherein attribute selection of
one data attribute from a set of local data attributes unique to the respective agent such
that the selected data attribute has the substantially highest local information gain value
among all attributes

The Office Action states that Kargupta discloses beginning attribute selection, citing col.
3, lines 20-27, which states:

"Given a set of observed feature values, the task is to learn a function that
computes the unknown value of a desired feature as a function of other observed
features. The given set of observed feature values is sometimes called the training
data set. In FIG. 1 the col. for f denotes the feature value to be predicted; x1, x2,
x3, x4, x5, x6 and x7 denote the features that are used to predict f."

Here Kargupta is pointing to the estimation of an unknown feature value based on the
value of the known features. This does not constitute "attribute selection by each agent." In
other words, there is no equivalency between attribute "estimation" (i.e., computing the unknown
value of an attribute) and attribute "selection" (i.e., picking the best attribute for the
classification task at hand).

We submit that attribute/feature selection can be considered as a pre-processing step prior
to function value estimation/prediction. In other words, the objective of attribute selection is to
find the most meaningful attribute for function value estimation/prediction. An example of
function value estimation/prediction is predicting a patient's cancer risk (function) based on
attributes such as sex, smoking habit, age and salary. An example of attribute selection is the
precursor step of deciding that smoking habit is the most discriminatory attribute for predicting a
patient's cancer risk, as it has the highest information gain value.
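
By way of illustration only, the distinction can be sketched in Python. The sketch assumes the conventional entropy-based definition of information gain used by ID3; the patient records, attribute names and function names are hypothetical and are not taken from the application or the cited art.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def select_attribute(rows, attributes, target):
    """Attribute selection: pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical patient records, invented for illustration only.
patients = [
    {"sex": "M", "smoker": "yes", "risk": "high"},
    {"sex": "F", "smoker": "yes", "risk": "high"},
    {"sex": "M", "smoker": "no", "risk": "low"},
    {"sex": "F", "smoker": "no", "risk": "low"},
]
best = select_attribute(patients, ["sex", "smoker"], "risk")
```

In this invented data set the smoking attribute separates the two risk classes completely, so it is the attribute selected. Note that the selection step returns an attribute name; it does not predict any patient's risk, which is the separate estimation task.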

Claim limitation: collecting the highest information gain values from the plurality of agents
by the mediator, wherein the highest information gain value of a respective agent is based
on its own local data with its own unique data attributes

The Office Action states that Kargupta discloses passing a best attribute, citing col. 13,
lines 18-27, which states:

"We can express Boolean decision tree as a function f: X^n → {0, 1}. The function f
maps positive and negative instances to one and zero respectively. A node in a
tree is labeled with a feature x_i. A downward link from the node x_i is labeled with
an attribute value of i-th feature. A path from the root node to a successor node
represents the subset of data that satisfies the different feature values labeled
along the path. These data subsets are essentially similarity based equivalence
classes and we shall call them schemata (schema in singular form). If h is a
schema, then h ∈ {0, 1, *}^n, where * denotes a wild-card that matches any value of
the corresponding feature."

Here, Kargupta is providing the reader with a high level description of the ID3 algorithm.
ID3, developed by Ross Quinlan, is arguably the most popular algorithm for decision tree
construction. The whole motivation behind the invention of claim 5 has been to come up with a
distributed version of the ID3 algorithm.

Kargupta does not disclose passing a best attribute from each of said plurality of agents to
said mediator. The patent merely describes the standard centralized approach to decision tree
construction.

We submit that Kargupta discloses the centralized approach to decision tree construction 
where there are only global attributes and globally optimum information gain values, whereas the 
invention of claim 5 involves a plurality of agents each with its own local attributes and locally 
optimum information gain values. The information on each agent's highest information gain 
value is collected by the mediator for comparison purposes. 
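
The collection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Agent` class, the `collect` function, the agent names and the data are all hypothetical, and the class labels are assumed to be known to every agent.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())

def gain(column, labels):
    """Information gain of one attribute column with respect to the class labels."""
    remainder = 0.0
    for value in set(column):
        subset = [l for v, l in zip(column, labels) if v == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

class Agent:
    """Holds attribute columns that no other agent has (hypothetical layout)."""
    def __init__(self, name, columns, labels):
        self.name, self.columns, self.labels = name, columns, labels

    def best_local(self):
        """Return (highest local gain, attribute name) computed from local data only."""
        return max((gain(col, self.labels), attr) for attr, col in self.columns.items())

def collect(agents):
    """Mediator step: gather each agent's highest local gain; no raw data is sent."""
    return {a.name: a.best_local() for a in agents}

labels = ["high", "high", "low", "low"]  # assumption: labels are shared by all agents
agents = [
    Agent("clinic", {"smoker": ["yes", "yes", "no", "no"]}, labels),
    Agent("registry", {"sex": ["M", "F", "M", "F"],
                       "age": ["old", "old", "old", "young"]}, labels),
]
reports = collect(agents)
```

Each entry in `reports` is a locally optimum value derived from attributes that only the reporting agent holds, in contrast to a centralized computation over a single global attribute set.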

Claim limitation: selecting by the mediator of a winning agent, wherein the winning agent
is the only agent from the plurality of agents with access to the local data attribute with the
highest global information gain value

The Office Action states that Kamath discloses selecting a winning agent from said
plurality of agents, citing col. 14, lines 9-19, which states:

"Each processor evaluates each of the local feature lists to find the best local split
(this is done in parallel by all processors).

"It communicates the local best splits and count statistics to all processors.

"Each processor determines the best global split (this is done in parallel by all
processors).

"Split the Data. Each processor splits on the winning feature, and sends the ID
numbers of its new left and right node data instances to all other processors."

Here, Kamath is not disclosing selecting a winning agent from said plurality of agents.
He is merely explaining the process of selecting a winning feature, which "is done in parallel by
all processors" (emphasis added). This is a significant algorithmic difference.

We submit that in Kamath's approach, all processors (agents) have access to the winning
feature (attribute). In the invention of claim 5, only a single agent has access to the winning
attribute.
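
The difference can be illustrated with a short sketch in which a single mediator, rather than every processor in parallel, compares the reported values and names one winning agent. The function name, agent names and gain values are hypothetical.

```python
def select_winner(reports):
    """Mediator step: reports maps agent name -> (highest local gain, attribute name).

    Returns the one agent whose local attribute has the highest gain overall.
    """
    winner = max(reports, key=lambda name: reports[name][0])
    gain, attribute = reports[winner]
    return winner, attribute, gain

# Hypothetical reports collected from two agents.
reports = {"clinic": (1.0, "smoker"), "registry": (0.31, "age")}
winner, attribute, gain = select_winner(reports)
```

The result identifies an agent, not merely a feature; under the assumptions of this sketch only that agent holds the values of the winning attribute.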

Claim limitation: initiating data splitting by said winning agent based on the value of the
data attribute with the highest information gain wherein the specified data attribute is
unique to the respective agent's local data

The Office Action states that Kamath discloses initiating data splitting (citing col. 13,
lines 56-60) by a winning agent (citing col. 14, lines 9-19). The cited passages are:

"The creation of the tree is thus split into two parts:

"(1) Initial Sorting

"First the training set is split into separate feature lists for each feature. Each list
contains the identification (ID) number of the data instance, the feature value, and
the class associated with the instance. This data is partitioned uniformly among
the processors."

"Each processor evaluates each of the local feature lists to find the best local split
(this is done in parallel by all processors).

"It communicates the local best splits and count statistics to all processors.

"Each processor determines the best global split (this is done in parallel by all
processors).

"Split the Data. Each processor splits on the winning feature, and sends the ID
numbers of its new left and right node data instances to all other processors."

Kamath does not disclose data splitting by the winning agent. Instead, the patent
discloses splitting by all agents (or processors) based on a winning feature. According to
Kamath, this is a process that is done in parallel by all processors. This is algorithmically quite
different from the invention of claim 5, where initial splitting is only performed by the winning
agent (i.e., the only agent with access to the data attribute with the highest information gain).
Since non-winning agents do not have access to the corresponding feature, data splitting by them
will not be feasible.
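
As a sketch only, splitting by the winning agent might look like the following, where the agent groups record identifiers by the value of the attribute that it alone holds. The function name, record identifiers and values are hypothetical; the resulting mapping stands in for the split data index information that would then be forwarded to the mediator.

```python
from collections import defaultdict

def split_by_attribute(record_ids, column):
    """Group record IDs by the value of the winning attribute.

    Returns split index information: attribute value -> list of record IDs.
    """
    index = defaultdict(list)
    for rid, value in zip(record_ids, column):
        index[value].append(rid)
    return dict(index)

# Hypothetical local data held only by the winning agent.
record_ids = [101, 102, 103, 104]
smoker = ["yes", "no", "yes", "no"]
split_index = split_by_attribute(record_ids, smoker)
```

An agent that does not hold the `smoker` column has no input from which to compute this mapping, which is why the initial split cannot be performed by the non-winning agents.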

Claim limitation: forwarding split data index information resulting from said data splitting
by said winning agent to said mediator

The Office Action states that Kamath discloses forwarding split data index information
resulting from said data splitting by said winning agent to said mediator, citing the above-quoted
text in col. 14, lines 9-19.

We submit that Kamath does not disclose forwarding split data index information 
resulting from data splitting by the winning agent. Instead, Kamath discloses the forwarding of 
all split data index information by all the agents (or processors). This is a significant algorithmic 
difference. 

Claim limitation: initiating data splitting by each of said plurality of agents other than said
winning agent based on the split data index information furnished by the winning agent
and broadcasted by the mediator

The Office Action states that Kamath discloses initiating data splitting by each of said
plurality of agents other than said winning agent, citing col. 14, lines 17-26, which states:

"Split the Data. Each processor splits on the winning feature, and sends the ID 
numbers of its new left and right node data instances to all other processors. 

"Then, each processor builds a hash table containing all the ID numbers, and 
information on which instances belong to which decision tree node. 

"Next, each processor, for each feature, probes the hash table for each ID number
to determine how to split that feature value.

"This process is carried out on the next unsolved decision tree."

Again, Kamath does not disclose initiating data splitting by each of said plurality of
agents "other than said winning agent." In fact, Kamath clearly states that data splitting is
performed by all the agents (or processors). As stated previously, we submit that such splitting
by all processors is not feasible in the invention of claim 5, as only one agent has access to the
data attribute with the highest information gain.
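
For illustration, a non-winning agent could partition its own records using only the record identifiers broadcast by the mediator, without ever seeing the values of the winning attribute. The function name and the data below are hypothetical.

```python
def apply_split_index(local_rows, split_index):
    """Partition local rows by record ID using index information from the mediator.

    local_rows maps record ID -> this agent's own attribute values.
    split_index maps branch label -> list of record IDs in that branch.
    """
    return {
        branch: {rid: local_rows[rid] for rid in ids if rid in local_rows}
        for branch, ids in split_index.items()
    }

# Hypothetical index furnished by the winning agent and broadcast by the mediator.
split_index = {"yes": [101, 103], "no": [102, 104]}
# Hypothetical local data of a non-winning agent.
registry_rows = {
    101: {"age": "old"}, 102: {"age": "old"},
    103: {"age": "young"}, 104: {"age": "old"},
}
partitions = apply_split_index(registry_rows, split_index)
```

The non-winning agent follows the split rather than computing it, which differs from every processor splitting on a winning feature that all of them hold.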

Claim limitation: generating and saving partial rules by repeating the attribute selection
and data splitting process recursively and by tracking the attribute/split information
coming from that iteration's winning agent

The examiner states that Cho discloses generating and saving partial rules, citing p. 2, 
lines 14-18, which states: 

"Besides devising a new technique named 'fragmentation approach' in this paper, 
we also investigate various technical details, such as how large a data set in each 
fragment (local data set) is adequate, how many rules are to be generated in each 
fragment, and the corresponding selection criterion by which to choose the 
generated rules from the fragments to form a global rule set." 

Cho does not disclose generating and saving partial rules. Instead, the article is referring
to the process of generating whole rules at each fragment and then using them to form a global
rule set. In contrast, the invention of claim 5 does not advocate constructing whole rules at each
agent location. Instead, the invention of claim 5 constructs rule "conditions" at each location.

For example, consider the following two rules:

A & B -> C1 (Rule 1)

C & D -> C3 (Rule 2)

Cho deals with generating all of Rule 1 at one fragment and all of Rule 2 at another and
then combining them to get the rule set.

Instead, the invention of claim 5 would generate A at one location, and B at another
location, so that combining them through a mediator would finally reveal the whole rule:
A & B -> C1.
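
The assembly of Rule 1 from partial conditions can be sketched as follows. The function name and agent labels are hypothetical, and the sketch assumes only that the mediator records which condition each iteration's winning agent contributed.

```python
def assemble_rule(partial_conditions, conclusion):
    """Join conditions contributed by successive winning agents into one rule."""
    antecedent = " & ".join(cond for _agent, cond in partial_conditions)
    return f"{antecedent} -> {conclusion}"

# Each tuple records which (hypothetical) agent contributed which condition.
partials = [("agent_1", "A"), ("agent_2", "B")]
rule = assemble_rule(partials, "C1")
```

No single location ever holds the whole rule before the mediator combines the conditions, unlike an approach in which each fragment produces complete rules on its own.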

Claim limitation: outputting complete rules obtained at the completion of the mining
process to said plurality of agents

The examiner states that Cho discloses outputting complete rules to said plurality of
agents, citing p. 4, lines 24-25, which states:

"Another strategy is to loosely account on an number of different inductive
learning algorithms by integrating their collective output concepts."

Cho does not disclose outputting complete rules to said plurality of agents. Instead, the
article is referring to the integration of the outputs of a number of different learning algorithms.
This deals with the topic of "multiple classifier" approaches, which is a whole discipline unto
itself and does not have anything to do with the invention of claim 5.

Accordingly, it is submitted that a number of limitations of claim 5 are not met by the
cited prior art, and that claim 5 is therefore patentable over the cited art. Claims 1-4 include
generally the same limitations as claim 5, and are therefore patentable for the same reasons. It is
therefore submitted that this application is in condition for allowance, and such action is
requested.

Respectfully submitted,
Harold V. Stotland 
Reg. No. 24,492 
Seyfarth Shaw 
Attorneys for Assignee 
55 East Monroe Street 
Suite 4200